Abstract

A clustered base transceiver station (BTS) coordination strategy is proposed for a large cellular 
MIMO network, which includes full intra-cluster coordination (to enhance the sum rate) and limited
inter-cluster coordination (to reduce interference for the cluster edge users). Multi-cell block diagonalization
is used to coordinate the transmissions across multiple BTSs in the same cluster. To satisfy per-BTS
power constraints, three combined precoder and power allocation algorithms are proposed with different 
performance and complexity tradeoffs. For inter-cluster coordination, the coordination area is chosen 
to balance fairness for edge users and the achievable sum rate. It is shown that a small cluster size 
(about 7 cells) is sufficient to obtain most of the sum rate benefits from clustered coordination while 
greatly relieving the channel feedback requirement. Simulations show that the proposed coordination strategy
efficiently reduces interference and provides a considerable sum rate gain for cellular MIMO networks. 
I. Introduction

Multi-antenna transmission and reception (known as MIMO) is a key technique for improving 
the throughput of future wireless broadband systems. For a point-to-point link with multiple 
antennas at both the transmitter and receiver, it has been shown that the capacity grows linearly
with the minimum number of transmit and receive antennas, i.e. the number of spatial degrees
of freedom¹ [2], [3]. Due to space constraints, however, mobile terminals can only have a
small number of antennas, normally one or two, which bounds the capacity gain promised 
by MIMO. Multi-user MIMO (MU-MIMO), where a BTS communicates with multiple mobile 
users simultaneously, provides an opportunity to boost the sum capacity through joint precoding 
(downlink) or joint decoding (uplink) even when each user has only one antenna [4]. For MU- 
MIMO with a large number of mobile users, however, the sum capacity of both the uplink and 
downlink is restricted by the number of antennas at the BTS, as it determines the number of 
spatial degrees of freedom. 

Although theoretically attractive, deploying MIMO in a commercial cellular system is fun- 
damentally different as the transmission in each cell acts as interference to other cells, and the 
entire network is essentially interference-limited. While the problem of interference is inherent to 
cellular systems, its effect on MIMO is more significant because each neighboring BTS antenna 
element can act as a unique interfering source, thereby making it difficult for the mobile to 
estimate and suppress them. With $N_r$ receive antennas, each mobile can only cancel/decode up
to $N_r$ different sources using linear techniques [5]. Furthermore, interference is more severe
for the downlink because complicated interference suppression techniques are not practical for 
mobile terminals, which need to be power-efficient and compact. Coordination between users 
is usually not allowed. The capacity gains promised by MIMO techniques have been shown 
to degrade severely in the multi-cell environment [6]-[8]. Conventional approaches to mitigate 
multi-cell interference, such as static frequency reuse, sectoring, and spread spectrum, are not 
efficient for MIMO networks as each has important drawbacks [9]. The difficulty in combating 
interference for MIMO is essentially due to the limitation of spatial degrees of freedom, most 
of which are used to suppress the spatial interference introduced by spatial multiplexing at the 
cell site while few are left to suppress other-cell interference. 

¹In this paper, the definition of the number of spatial degrees of freedom follows [1]. It represents the dimension of the
transmitted signal as modified by the MIMO channel, and is equal to the rank of the channel matrix when it has full rank.
Therefore, for a point-to-point link with $N_t$ transmit antennas and $N_r$ receive antennas, it is $\min\{N_t, N_r\}$; for multiuser MIMO
channels with $K$ users, it is $\min\{N_t, KN_r\}$; for a BTS coordination system with $B$ BTSs, it is $\min\{BN_t, KN_r\}$.

Thanks to the fast improvement of processing capability at BTSs and the increase of the
backhaul capacity, coordinated multi-cell MIMO communications with cooperative processing
among BTSs have drawn a significant amount of interest in recent years. The conventional MIMO
network with single-cell processing forms a MIMO interference channel, whose spatial degrees 
of freedom are determined by the number of transmit antennas at each BTS [10]. With full 
coordination across B BTSs and a large number of mobile users, the coordination system 
forms a virtual MIMO broadcast channel, which increases the spatial degrees of freedom by 
B times. Similar to the transition from single-user MIMO to MU-MIMO, such cooperation 
across multiple BTSs can provide great advantages over single-BTS processing [11]-[14]. This 
paper proposes a BTS coordination strategy with clustered linear precoding for the downlink of 
cellular MU-MIMO systems, which efficiently reduces interference and provides a great sum rate
gain by exploiting the expanded spatial degrees of freedom. 
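As a quick numeric illustration of the degrees-of-freedom counts used above, the following minimal sketch evaluates the three expressions $\min\{N_t, N_r\}$, $\min\{N_t, KN_r\}$ and $\min\{BN_t, KN_r\}$. The function names and the example values are hypothetical and chosen only for illustration.

```python
def dof_point_to_point(n_t, n_r):
    """Spatial degrees of freedom of a single point-to-point MIMO link."""
    return min(n_t, n_r)


def dof_multiuser(n_t, n_r, num_users):
    """Degrees of freedom of a single-cell MU-MIMO channel with K users."""
    return min(n_t, num_users * n_r)


def dof_coordinated(n_t, n_r, num_users, num_bts):
    """Degrees of freedom when B BTSs coordinate as one virtual transmitter."""
    return min(num_bts * n_t, num_users * n_r)


if __name__ == "__main__":
    # Illustrative values only (not taken from the paper's simulations).
    n_t, n_r, num_users, num_bts = 4, 2, 20, 7
    print(dof_point_to_point(n_t, n_r))
    print(dof_multiuser(n_t, n_r, num_users))
    print(dof_coordinated(n_t, n_r, num_users, num_bts))
```

With many users, the coordinated count is $B$ times the single-cell MU-MIMO count, which is the expansion exploited by the proposed strategy.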

A. Related Work 

Intercell scheduling, where neighboring BTSs cooperatively schedule their transmissions, is 
a practical strategy to reduce interference, as in each time slot only one BTS in each cluster is
transmitting and it requires only a message exchange comparable to that for handoff. In [15], it was
shown that one major advantage of intercell scheduling compared with conventional frequency 
reuse is the expanded multiuser diversity gain. The interference reduction is at the expense of a 
transmission duty cycle, however, and it does not make full use of the available spatial degrees 
of freedom. 

Recently, BTS coordination has been proposed as an effective technique to mitigate interference 
in the downlink of multi-cell networks [11]. By sharing information across BTSs and designing 
downlink signals cooperatively, signals from other cells may be used to assist the transmission 
instead of acting as interference, and the available degrees of freedom are fully utilized. In [12], 
BTS coordination with DPC was first proposed with single-antenna transmitters and receivers in 
each cell. BTS coordination in a downlink multi-cell MIMO network was studied in [13], with 
a per-BTS power constraint and various joint transmission schemes. The maximum achievable 
common rate in a coordinated network, with zero-forcing (ZF) and DPC, was studied in [14], 
[16], which demonstrated a significant gain over the conventional single BTS transmission. With 
simplified network models, analytical results were derived for multi-cell ZF beamforming in [17] 
and for various coordination strategies with grouped cell interior and edge users in [18]. Studies 
considering practical issues such as limited-capacity backhaul and asynchronous interference can
be found in [19]-[22]. 

When BTSs coordinate for transmission, they form an effective MU-MIMO broadcast channel,
for which DPC has been shown to be an optimal precoding technique [23]-[27]. DPC, while 
theoretically optimal, is an information theoretic concept that is difficult to implement in practice. 
A more practical precoding technique for broadcast MIMO channels is block diagonalization 
(BD) [28]-[32], which provides each user an interference-free channel with properly designed 
linear precoding matrices. In addition, it was shown in [33] that BD can achieve a significant 
part of the ergodic sum capacity of DPC. Therefore, we will apply BD in the multi-cell scenario 
as the precoding technique for the proposed BTS coordination. 

Most previous studies on BTS coordination assume a global coordination which eliminates 
inter-cell interference completely. However, in realistic cellular systems, issues such as the 
complexity of joint processing across all the BTSs, the difficulty in acquiring full CSI from 
all the mobiles at each BTS, and time and phase synchronization requirements will make full 
coordination extremely difficult, especially for a large network. Therefore, it is of great interest 
to develop coordination schemes at a local scale, to lower the system complexity and maintain 
the benefits of BTS coordination. For the uplink, an overlapping coordination cluster structure 
was proposed in [34], where each BTS is at the center of a unique cluster and coordinated 
combining is performed to suppress interference for the central BTS of each cluster. With such 
an overlapping cluster structure, each user is in the interior of a cluster and enjoys interference 
reduction, but the cluster number is as large as the number of BTSs and it cannot be easily 
extended to the downlink. In [35], the downlink coordination over a 3-cell cluster was investigated 
with both ZF and DPC, but no inter-cluster coordination was considered. 

B. Contributions 

In this paper, we propose a clustered BTS coordination strategy for the downlink of a large 
cellular MIMO network. With full coordination within the same cluster, the available spatial 
degrees of freedom are greatly increased, which are then used to reduce inter-cluster interference 
and exploit the sum rate gain. This strategy consists of a full intra-cluster coordination and a 
limited inter-cluster coordination. The intra-cluster coordination results in precoding across BTSs 
within the same cluster for MU-MIMO, while the inter-cluster coordination is used to pre-cancel 
interference for the users at the edge of neighboring clusters. In this way, interference for both
cluster interior and cluster edge users is efficiently mitigated. Meanwhile, the system complexity
and CSI requirements at the BTSs, which are on a cluster scale, are greatly reduced compared 
to global coordination. As the main complexity is at the BTSs, mobile users can enjoy a simple 
conventional receiver. In addition, universal frequency reuse is applied, and there is no need
for cell planning. 

We apply multi-cell BD as the precoding technique for such coordination. The precoder
matrix design is modified from conventional single-cell BD, for which we consider other- 
cluster interference suppression. In contrast to the classical MIMO broadcast channel, the BTS 
coordination system has a per-BTS power constraint. As there is no closed-form solution for 
the power allocation problem with such a power constraint, three different power allocation 
algorithms are proposed. For inter-cluster coordination, we show that there is a tradeoff between 
fairness and sum rate while choosing the inter-cluster coordination area. It is shown that a small 
cluster size (about 7 cells) can achieve a significant part of the sum rate gain provided by the 
clustered coordination while greatly reducing channel information feedback compared to global 
coordination. Simulations show that the proposed coordination strategy improves the sum rate 
over conventional systems and reduces the impact of interference for cluster-edge users. 

The BTS coordination considering two classes of users (edge and interior) was also investigated 
in [18], which derived information-theoretic results based on a simplified Wyner-type circular 
network model. In this paper, we consider a more practical setting: a large tessellated 2-D network.
We propose clustered coordination based on low-complexity linear precoding, design parameters
for such coordination, and demonstrate the achievable performance with simulation. We have
made some idealized assumptions in this paper, such as perfect information about channel state 
and interference. We demonstrate through simulation that the coordination system is sensitive to 
imperfect channel knowledge. The full investigation of these practical issues is, however, left to 
future work. 

C. Organization 

The rest of the paper is organized as follows: In Section II, we make necessary assumptions,
and describe the proposed coordination strategy and the received signal model. Section III
presents the precoding matrix design for multi-cell BD. The details of inter-cluster coordination
and the associated parameter design are stated in Section IV. Numerical results are presented in
Section V and conclusions are drawn in Section VI. 

II. System Model 
A. Clustered MIMO Network Structure 

Consider a cellular MIMO network, where both BTSs and mobile users have multiple antennas,
$N_t$ and $N_r$, respectively. The system parameters used in this paper are summarized in Table I.
We consider a large network, i.e. the number of cells in the network is very large, so it is 
impractical to do coordination across all the BTSs. We propose to divide the network into a 
number of disjoint clusters, where each cluster contains a group of adjacent cells, as in Fig. 1.
With coordination among the BTSs within the same cluster, we effectively increase the number 
of spatial degrees of freedom, which will be used to suppress interference, including inter-user 
and inter-cluster interference, and provide sum rate gain. 

We apply universal frequency reuse, so the users at the cluster edge may suffer a high degree 
of interference from neighboring clusters. To efficiently accommodate all the users, we group 
them into two classes: cluster interior users and cluster edge users. A discussion about user 
grouping will be given in Section IV. To do the proposed clustered coordination, we make 
several assumptions. 

Assumption 1: The BTSs within a cluster have perfect CSI of all the users in this cluster, and 
perfect CSI of the edge users in the neighboring clusters. 

For a time-division duplexing (TDD) system, the BTS can obtain the downlink CSI through 
direct uplink channel estimation due to channel reciprocity. For a frequency-division duplexing 
(FDD) system, the downlink CSI can be obtained by feedback from mobile users. Limited
feedback for MU-MIMO is an ongoing research topic [36]-[38]; we do not explore it in this paper
and assume perfect CSI. The assumption of the availability of CSI of the edge users is based
on the fact that for handoff such users have CSI of multiple neighboring clusters and can feed 
back such information. The full CSI of the users in the same cluster is for MU-MIMO precoding 
to cancel the inter-user interference. The CSI of the edge users in the neighboring clusters is for 
pre-canceling the inter-cluster interference for these users. 

Assumption 2: The BTSs within the same cluster can fully share CSI and user data. The BTSs 
in different clusters can exchange traffic information, such as the number of active users and 
user locations.

The capability of full coordination of the BTSs within the same cluster enables doing MU- 
MIMO precoding across all the BTS antennas in this cluster. The limited coordination between 
BTSs in different clusters can be used for scheduling, e.g. the cluster with a large number of 
active users may not do inter-cluster coordination for the neighboring clusters. 

Assumption 3: BTSs within the same cluster are perfectly synchronized in time and phase, and 
different propagation delays from these BTSs to mobile users in this cluster are compensated. 

This assumption is to ensure synchronous reception from the home BTSs at mobile users. It 
is difficult to realize perfect synchronization in practice, and the investigation of the impact of 
asynchronous reception is out of the scope of this paper. Recently, there has been some study 
on this subject [22]. 

From these assumptions, the system requirements for clustered coordination are based on a 
cluster scale, which is much lower than that for global coordination, especially in a large network. 

B. Coordination Strategy 

Based on the clustered structure and assumptions in the last section, we propose a clustered 
coordination strategy, including full intra-cluster coordination and limited inter-cluster coordina- 
tion. The transmission strategies for different user groups are described as follows. 

Cluster interior users: BTSs within the same cluster work together as a "super BTS" to 
serve the interior users in that cluster with MU-MIMO precoding. In this way, there will be 
no intra-cluster interference, i.e. inter-user interference, for these users. In addition, the interior 
users are protected to a large degree from inter-cluster interference by path loss. 

Cluster edge users: Multiple neighboring clusters have channel information of edge users, 
and they coordinate for the data transmission: one of these clusters is selected to act as the 
home cluster to transmit data to such a user, and other neighboring clusters will take this 
user into consideration when designing precoding matrices. With pre-cancelation of intra-cluster 
interference provided by the home cluster and pre-cancelation of inter-cluster interference at 
other neighboring clusters, there will be no interference for this edge user from those clusters. 

With such a coordination strategy, the interference for both cluster interior and cluster edge
users is efficiently mitigated. Fractional frequency reuse (FFR) is another technique for interference
management where BTSs cooperatively schedule users in different downlink bandwidths.
However, FFR is a frequency-domain interference management technique. The proposed co- 
ordination strategy is a spatial domain technology that can be implemented with a universal 
frequency reuse. For a highly-loaded system, FFR alone cannot accommodate all the edge users. 
Networked MIMO offers another opportunity to serve them. 

C. Received Signal Model 

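Without loss of generality, we consider the cluster $c$. The $N_r \times 1$ received signal vector at the
$k$th user in the cluster $c$ is given as

$$
y_k^{(c)} = \underbrace{\sum_{b=1}^{B} H_k^{(c,b)} T_k^{(c,b)} x_k^{(c)}}_{\text{desired signal}}
+ \underbrace{\sum_{b=1}^{B} H_k^{(c,b)} \sum_{i=1, i \neq k}^{K} T_i^{(c,b)} x_i^{(c)}}_{\text{intra-cluster interference}}
+ \underbrace{\sum_{\hat{c}=1, \hat{c} \neq c}^{C} \sum_{b=1}^{B} H_k^{(\hat{c},b)} \sum_{i=1}^{K^{(\hat{c})}} T_i^{(\hat{c},b)} x_i^{(\hat{c})}}_{\text{inter-cluster interference}}
+ n_k^{(c)} \tag{1}
$$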

where 

• $x_k^{(c)}$ is the $l_k \times 1$ transmitted vector for user $k$ in cluster $c$. Denote $x^{(c)} = [x_1^{(c)*} \; x_2^{(c)*} \; \cdots \; x_K^{(c)*}]^*$,
where $*$ denotes the conjugate transpose of a matrix. The covariance matrix for $x^{(c)}$
is denoted as $Q^{(c)} = E[x^{(c)} x^{(c)*}]$.

• $H_k^{(c,b)}$ is the $N_r \times N_t$ channel matrix from BTS $b$ in cluster $c$ to user $k$.

• $T_k^{(c,b)}$ is the $N_t \times l_k$ precoding matrix for user $k$ at the $b$th BTS in cluster $c$.

• $n_k^{(c)}$ is the additive white Gaussian noise at user $k$ in cluster $c$, with zero mean and variance
$E(n_k^{(c)} n_k^{(c)*}) = \sigma_n^2 I_{N_r}$.

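Because the $B$ BTSs within this cluster coordinate to work as a super BTS, the signal model
can be written as

$$
y_k^{(c)} = H_k^{(c)} \sum_{i=1}^{K} T_i^{(c)} x_i^{(c)} + n_k^{(c)} + \sum_{\hat{c}=1, \hat{c} \neq c}^{C} H_k^{(\hat{c})} \sum_{j=1}^{K^{(\hat{c})}} T_j^{(\hat{c})} x_j^{(\hat{c})} \tag{2}
$$

where $H_k^{(c)} = [H_k^{(c,1)} \; H_k^{(c,2)} \; \cdots \; H_k^{(c,B)}]$ is the $N_r \times N_t B$ aggregate channel transfer matrix from
the super BTS to user $k$, and

$$
T_k^{(c)} = [T_k^{(c,1)*} \; T_k^{(c,2)*} \; \cdots \; T_k^{(c,B)*}]^*
$$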



is the aggregate transmit precoder for user k over all B BTSs. Unlike traditional downlink with 
co-located MIMO channels, the channel gains from any two antennas at different BTSs are 
guaranteed to be independent. 
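Denote $z_k^{(c)} = n_k^{(c)} + \sum_{\hat{c}=1, \hat{c} \neq c}^{C} H_k^{(\hat{c})} \sum_{j=1}^{K^{(\hat{c})}} T_j^{(\hat{c})} x_j^{(\hat{c})}$ as the sum of the noise and interference
from other clusters, the covariance matrix of which is

$$
R_k^{(c)} = E[z_k^{(c)} z_k^{(c)*}] = \sigma_n^2 I_{N_r} + \sum_{\hat{c}=1, \hat{c} \neq c}^{C} \sum_{j=1}^{K^{(\hat{c})}} H_k^{(\hat{c})} T_j^{(\hat{c})} Q_j^{(\hat{c})} T_j^{(\hat{c})*} H_k^{(\hat{c})*}. \tag{3}
$$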

Assumption 4: The interference plus noise covariance matrix is perfectly known at the mobile 
users and BTSs in the same cluster. 

This covariance matrix can be estimated at mobile users by various methods, including the 
usage of silent period of the desired signal [39], the usage of pilot signal [40] and blind estimation 
[41] according to multiple access strategies. After such estimation, each user feeds it back
to the BTS, where it is used to design the precoding matrix.

III. Clustered Multi-cell BD 

In the proposed coordination strategy, both cluster interior and cluster edge users are served 
by multi-cell BD with pre-cancelation at the "super BTS". BD is a linear precoding technique
for downlink MU-MIMO systems, and single-cell BD has been well studied [28]-[32]. A major 
difference between multi-cell BD and single-cell BD is the power constraint. While single-cell 
BD has a total power constraint (TPC), each BTS in the cluster has its own power constraint, 
so multi-cell BD has a per-BTS power constraint (PBPC). In this section, we will design the 
clustered multi-cell BD, which can be separated into two parts: the precoding matrix design
and the power allocation design. The design of the precoding matrix will consider other-cell
interference (OCI) and follow the algorithm proposed in [32], which combines interference 
whitening at the receiver and a statistical OCI-aware precoder at the transmitter to reduce OCI 
and is shown to provide better sum rate performance than conventional BD. For the power 
allocation, three different algorithms will be proposed for PBPC. 

A. Precoding Matrix Design 

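To suppress other-cell interference, we apply an $N_r \times N_r$ whitening filter $W_k^{(c)}$ at the receiver
for each user, which is shown to be related with $R_k^{(c)}$ as [32]

$$
W_k^{(c)} R_k^{(c)} W_k^{(c)*} = I_{N_r}.
$$

With this whitening filter, the received signal for user $k$ after post-processing is

$$
r_k^{(c)} = W_k^{(c)} y_k^{(c)} = W_k^{(c)} H_k^{(c)} \sum_{i=1}^{K} T_i^{(c)} x_i^{(c)} + W_k^{(c)} z_k^{(c)} = \bar{H}_k^{(c)} \sum_{i=1}^{K} T_i^{(c)} x_i^{(c)} + \bar{z}_k^{(c)} \tag{4}
$$

where $\bar{H}_k^{(c)} = W_k^{(c)} H_k^{(c)}$ and $\bar{z}_k^{(c)} = W_k^{(c)} z_k^{(c)}$ are the equivalent channel matrix and noise vector.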
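The interference-plus-noise covariance in (3) and the whitening step can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the whitening filter is taken as the Hermitian inverse square root of the covariance (one valid choice among several), and the function names and the `(H, T, Q)` input layout are hypothetical.

```python
import numpy as np


def interference_plus_noise_cov(n_r, noise_var, interferers):
    """Covariance of noise plus inter-cluster interference, in the form of (3).

    interferers: iterable of (H, T, Q) tuples (hypothetical layout), with
      H: n_r x n_tx aggregate channel from an interfering cluster to the user,
      T: n_tx x l precoder of one user served by that cluster,
      Q: l x l covariance of that user's data vector.
    """
    R = noise_var * np.eye(n_r, dtype=complex)
    for H, T, Q in interferers:
        G = H @ T
        R = R + G @ Q @ G.conj().T
    return R


def whitening_filter(R):
    """Assumed form: W = R^(-1/2), computed from the eigendecomposition of R."""
    eigvals, eigvecs = np.linalg.eigh(R)
    # Dividing column j by sqrt(eigvals[j]) forms U diag(eigvals^(-1/2)).
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.conj().T
```

Applying `W = whitening_filter(R)` to the received vector leaves an effective noise term with identity covariance, and the equivalent channel is simply `W @ H`.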

Based on the equivalent signal model in (4), we can get the precoder for multi-cell BD. First,
construct the aggregate interference matrix for user $k$ in cluster $c$ as

$$
\tilde{H}_k^{(c)} = [\bar{H}_1^{(c)*} \; \cdots \; \bar{H}_{k-1}^{(c)*} \; \bar{H}_{k+1}^{(c)*} \; \cdots \; \bar{H}_K^{(c)*}]^*. \tag{5}
$$

The principal idea of BD is to find the precoding matrix $T_k^{(c)}$ such that $\tilde{H}_k^{(c)} T_k^{(c)} = 0$, which
means there is no inter-user interference. Thus $T_k^{(c)}$ lies in the null space of $\tilde{H}_k^{(c)}$. A sufficient
condition for the existence of a nonzero effective channel matrix for user $k$, $\bar{H}_k^{(c)} T_k^{(c)}$, is that
at least one row of $\bar{H}_k^{(c)}$ is linearly independent of the rows of $\tilde{H}_k^{(c)}$ [42]. This introduces the
constraint that the number of total transmit antennas ($BN_t$) is no smaller than the number of
total receive antennas ($KN_r$). Therefore, there is a constraint on the total number of users that
can be served simultaneously in each cluster [29], [30], specified as follows²:

Lemma 1 (User constraint for multi-cell BD): For a clustered MIMO network with B BTSs 
per cluster, the maximum number of users that can be supported simultaneously in each cluster 
by multi-cell BD is bounded by 

$$
K \le K_{\max} = \left\lfloor \frac{BN_t}{N_r} \right\rfloor,
$$

where $\lfloor x \rfloor$ is the maximum integer less than or equal to $x$.
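The bound in Lemma 1 is easy to evaluate directly. The helper below is a small sketch with a hypothetical function name; the example values in the usage note are illustrative only.

```python
def max_users_bd(num_bts, n_t, n_r):
    """Lemma 1 bound: floor(B * N_t / N_r) users per cluster for multi-cell BD."""
    if min(num_bts, n_t, n_r) < 1:
        raise ValueError("B, N_t and N_r must all be positive integers")
    # Integer floor division implements the floor operator exactly.
    return (num_bts * n_t) // n_r
```

For instance, a cluster of 7 BTSs with 4 transmit antennas each, serving users with 2 receive antennas, gives `max_users_bd(7, 4, 2)`, i.e. the floor of 28/2.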



Assuming $K \le \lfloor BN_t / N_r \rfloor$, we describe the precoding matrix design as follows. Let
$\tilde{l}_k = \mathrm{rank}(\tilde{H}_k^{(c)})$, and denote the singular value decomposition (SVD) of $\tilde{H}_k^{(c)}$ as

$$
\tilde{H}_k^{(c)} = \tilde{U}_k \tilde{\Lambda}_k [\tilde{V}_{k,1} \; \tilde{V}_{k,0}]^*,
$$

where $\tilde{V}_{k,1}$ contains the first $\tilde{l}_k$ right singular vectors and $\tilde{V}_{k,0}$ contains the last $BN_t - \tilde{l}_k$ right
singular vectors. Therefore, $\tilde{V}_{k,0}$ forms a null space basis of $\tilde{H}_k^{(c)}$, from which we can get $T_k^{(c)}$.
In this paper, we assume the number of spatial streams for each user is $l_k = N_r$. If $l_k < N_r$ or
there are extra transmit antennas, additional optimization can be done by picking the appropriate
precoder subset [44] or doing coordinated beamforming [43].
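The null-space construction described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the function names are hypothetical, the inputs are assumed to be the equivalent (whitened) channels, and the choice of the strongest directions inside the null space is one possible way to pick the precoder columns.

```python
import numpy as np


def null_space_basis(A, tol=1e-10):
    """Orthonormal basis of the null space of A from its right singular vectors."""
    _, s, Vh = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    return Vh[rank:].conj().T


def bd_precoders(H_bar, n_streams):
    """Block-diagonalization precoders for K users.

    H_bar: list of K equivalent channels, each N_r x (B * N_t).
    Returns a list of (B * N_t) x n_streams precoders with orthonormal columns.
    """
    n_tx = H_bar[0].shape[1]
    precoders = []
    for k, H_k in enumerate(H_bar):
        others = [H for j, H in enumerate(H_bar) if j != k]
        # Aggregate interference matrix of user k: all other users' channels stacked.
        V0 = null_space_basis(np.vstack(others)) if others else np.eye(n_tx, dtype=complex)
        if V0.shape[1] < n_streams:
            raise ValueError("not enough null-space dimensions for the requested streams")
        # Assumption: keep the strongest directions of user k's channel inside the null space.
        _, _, Vh = np.linalg.svd(H_k @ V0, full_matrices=False)
        precoders.append(V0 @ Vh[:n_streams].conj().T)
    return precoders
```

By construction, every other user's equivalent channel multiplied by user $k$'s precoder is numerically zero, so each user sees an interference-free effective channel within the cluster.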



²If antenna selection or a decoding matrix is applied at the mobile user, it is possible to support more users than this bound
[30], [43], which we will not consider in this paper.

With the derived $T_k^{(c)}$, the received signal becomes

$$
r_k^{(c)} = \bar{H}_k^{(c)} T_k^{(c)} x_k^{(c)} + \bar{z}_k^{(c)}.
$$

Denote $T^{(c,b)} = [T_1^{(c,b)} \; T_2^{(c,b)} \; \cdots \; T_K^{(c,b)}]$ as the submatrix associated with BTS $b$. Then the
transmit power constraint for each BTS is

$$
\mathrm{Tr}(T^{(c,b)} Q^{(c)} T^{(c,b)*}) \le P, \quad b = 1, \ldots, B.
$$

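The achievable sum rate per cell for the clustered multi-cell BD is then given by [32]

$$
R_{CBD} = \frac{1}{B} \max_{\mathrm{Tr}(T^{(c,b)} Q^{(c)} T^{(c,b)*}) \le P} \sum_{k=1}^{K} \log_2 \left| I_{N_r} + \bar{H}_k^{(c)} T_k^{(c)} Q_k^{(c)} T_k^{(c)*} \bar{H}_k^{(c)*} \right| \tag{6}
$$

where $Q_k^{(c)}$ is the covariance matrix for $x_k^{(c)}$, and $Q^{(c)} = \mathrm{blockdiag}\{Q_1^{(c)}, Q_2^{(c)}, \cdots, Q_K^{(c)}\}$.
Denote the SVD of the effective channel $\bar{H}_k^{(c)} T_k^{(c)}$ as

$$
\bar{H}_k^{(c)} T_k^{(c)} = U_k \Lambda_k V_k^*,
$$

where $\Lambda_k = \mathrm{diag}(\lambda_{k,1}, \cdots, \lambda_{k,l_k})$, and $l_k = \mathrm{rank}(\bar{H}_k^{(c)} T_k^{(c)})$. Let $\Lambda^{(c)} = \mathrm{blockdiag}\{\Lambda_1, \cdots, \Lambda_K\}$.
Then the sum rate can be written as

$$
R_{CBD} = \frac{1}{B} \max_{\mathrm{Tr}(T^{(c,b)} Q^{(c)} T^{(c,b)*}) \le P} \log_2 \left| I + \Lambda^{(c)} \hat{Q}^{(c)} \Lambda^{(c)*} \right| \tag{7}
$$

where $\hat{Q}^{(c)} = V^{(c)*} Q^{(c)} V^{(c)}$, and $V^{(c)} = \mathrm{blockdiag}\{V_1, \cdots, V_K\}$.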



B. Power Allocation with PBPC 

For the power allocation with PBPC, we propose one optimal and two sub-optimal schemes: 
user scaling and scaled water-filling. Both the optimal scheme and user scaling scheme are convex
optimization problems, and the scaled water-filling scheme is modified from the conventional 
water-filling power allocation algorithm. 

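1) Optimal Power Allocation: The optimal power allocation matrix to maximize (7) is a diagonal
matrix [2], denoted as $\hat{Q}_{opt} = \mathrm{diag}(\gamma_{1,1}, \gamma_{1,2}, \cdots, \gamma_{1,l_1}, \gamma_{2,1}, \cdots, \gamma_{K,l_K})$. The corresponding
achievable sum rate is given as

$$
R_{CBD} = \frac{1}{B} \max_{\mathrm{Tr}(T^{(c,b)} Q^{(c)} T^{(c,b)*}) \le P} \sum_{k=1}^{K} \sum_{l=1}^{l_k} \log_2 \left( 1 + \lambda_{k,l}^2 \gamma_{k,l} \right).
$$

The power constraint can be rewritten as

$$
\sum_{k=1}^{K} \sum_{l=1}^{l_k} \| t_{k,l}^{(c,b)} \|^2 \gamma_{k,l} \le P, \quad b = 1, \cdots, B,
$$

where $t_{k,l}^{(c,b)}$ is the $l$th column of $T_k^{(c,b)}$.

Thus, the optimal power allocation problem with PBPC can be formulated as

$$
R_{CBD} = \frac{1}{B} \max_{\gamma_{k,l}} \sum_{k=1}^{K} \sum_{l=1}^{l_k} \log_2 \left( 1 + \lambda_{k,l}^2 \gamma_{k,l} \right) \tag{8}
$$

subject to

$$
\sum_{k=1}^{K} \sum_{l=1}^{l_k} \| t_{k,l}^{(c,b)} \|^2 \gamma_{k,l} \le P, \quad b = 1, \cdots, B,
$$
$$
\gamma_{k,l} \ge 0, \quad l = 1, \cdots, l_k, \; k = 1, \cdots, K.
$$
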
For this optimization problem, the power constraints for different users are coupled. Similar 
problems with per-antenna power constraints have been studied in [45], [46]. To the best of
our knowledge, no algorithm as efficient as water-filling is available at this point for power allocation
problems with per-antenna or per-BTS power constraints. The objective function, however,
is concave and the constraint functions are linear, so this is a convex optimization problem 
and can be solved numerically, e.g. with the interior-point method [47]. However, with a large 
number of users, and multiple transmit and receive antennas, it is quite complex to solve this 
optimization problem, and we propose two sub-optimal schemes in the following sections. 
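To see why the coupled constraints in (8) rule out a single water level, it helps to write out the stationarity condition. The following is a sketch only: the multipliers $\nu_b \ge 0$ (one per BTS power constraint) are notation introduced here for illustration, and the constant factor $1/B$ is dropped.

```latex
\begin{align}
\mathcal{L} &= \sum_{k=1}^{K}\sum_{l=1}^{l_k}\log_2\!\left(1+\lambda_{k,l}^2\gamma_{k,l}\right)
  - \sum_{b=1}^{B}\nu_b\left(\sum_{k=1}^{K}\sum_{l=1}^{l_k}
    \bigl\|t_{k,l}^{(c,b)}\bigr\|^2\gamma_{k,l} - P\right), \\
\frac{\partial\mathcal{L}}{\partial\gamma_{k,l}} = 0
  \;&\Rightarrow\;
  \frac{\lambda_{k,l}^2}{\ln 2\,\bigl(1+\lambda_{k,l}^2\gamma_{k,l}\bigr)}
  = \sum_{b=1}^{B}\nu_b\bigl\|t_{k,l}^{(c,b)}\bigr\|^2, \\
\gamma_{k,l} &= \left[\frac{1}{\ln 2\,\sum_{b=1}^{B}\nu_b\bigl\|t_{k,l}^{(c,b)}\bigr\|^2}
  - \frac{1}{\lambda_{k,l}^2}\right]^{+}.
\end{align}
```

With a single total power constraint and unit-norm precoder columns, the first term is the same for every stream and the expression reduces to classical water-filling. With $B$ constraints, each stream has its own level that depends on all $\nu_b$ jointly, so the multipliers have to be found numerically.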

2) User Scaling (US): One sub-optimal power allocation scheme is user scaling, for which
we weight the precoding matrix for each user, by choosing $\hat{Q}_{US} = \mathrm{blockdiag}(\mu_1 I_{l_1}, \cdots, \mu_K I_{l_K})$,
where $\mu_k$ is to scale the precoding matrix of the $k$th user to meet the power constraint.

There are several reasons for doing this. First, with fewer weight terms it reduces the complex- 
ity for solving the optimization problem compared with the optimal scheme. Second, for each 
user an equal power allocation only results in a negligible capacity loss compared to the optimal 
water-filling, especially at high SINR, and with shadowing the power allocation across users 
plays a more important role than across streams of each user. Third, user scaling makes it easy 
to adjust transmit power between different users, for example, to meet a fixed rate constraint. 
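Denote $\omega_k^{(c,b)} = \| T_k^{(c,b)} \|_F^2$. The optimization problem for the user scaling scheme is

$$
R_{US} = \frac{1}{B} \max_{\mu_k} \sum_{k=1}^{K} \sum_{l=1}^{l_k} \log_2 \left( 1 + \lambda_{k,l}^2 \mu_k \right) \tag{9}
$$

subject to

$$
\sum_{k=1}^{K} \omega_k^{(c,b)} \mu_k \le P, \quad b = 1, \cdots, B,
$$
$$
\mu_k \ge 0, \quad k = 1, \cdots, K.
$$
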
Again, this is a convex optimization problem. 

3) Scaled Water-Filling (SWF): As it is difficult to get an efficient algorithm to solve (8) and
(9), we propose another sub-optimal scheme based on the water-filling algorithm.
First, consider a multi-cell BD system with TPC, whose sum rate is given by

$$
R_{TPC} = \frac{1}{B} \max_{\hat{Q}_{TPC}} \log_2 \left| I + \Lambda^{(c)} \hat{Q}_{TPC} \Lambda^{(c)*} \right|. \tag{10}
$$

The optimal power loading matrix $\hat{Q}_{TPC} = S^{(c)}$ is derived by water-filling [29]. To meet PBPC,
we scale this matrix and choose $\hat{Q}_{SWF} = \mu \hat{Q}_{TPC}$. The scaling factor $\mu \in (0, 1)$ is given by

$$
\mu = \frac{P}{\max_{b=1,2,\ldots,B} \mathrm{Tr}(T^{(c,b)} \hat{Q}_{TPC} T^{(c,b)*})}.
$$

Therefore, the sum rate per cell is given by

$$
R_{SWF} = \frac{1}{B} \log_2 \left| I + \mu \Lambda^{(c)} S^{(c)} \Lambda^{(c)*} \right|. \tag{11}
$$
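The scaled water-filling steps can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the total power budget of the TPC problem is taken as $B \cdot P$ (the text only says "TPC"), `gains` holds the squared singular values $\lambda_{k,l}^2$ in a flat list, and `column_power[b][j]` is a hypothetical input holding the squared norm of stream $j$'s precoder column at BTS $b$.

```python
import math


def water_filling(gains, total_power):
    """Classical water-filling: maximize sum log2(1 + g_j * p_j) with sum p_j <= total_power."""
    order = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
    powers = [0.0] * len(gains)
    for n_active in range(len(gains), 0, -1):
        active = order[:n_active]
        level = (total_power + sum(1.0 / gains[j] for j in active)) / n_active
        # Accept this active set once its weakest stream still gets positive power.
        if level > 1.0 / gains[active[-1]]:
            for j in active:
                powers[j] = level - 1.0 / gains[j]
            break
    return powers


def scaled_water_filling(gains, column_power, per_bts_power):
    """Water-fill under an assumed total budget B * P, then scale to meet every per-BTS limit."""
    num_bts = len(column_power)
    tpc_powers = water_filling(gains, num_bts * per_bts_power)
    bts_power = [sum(w * p for w, p in zip(row, tpc_powers)) for row in column_power]
    mu = per_bts_power / max(bts_power)
    return mu, [mu * p for p in tpc_powers]


def sum_rate_per_cell(gains, powers, num_bts):
    """Per-cell sum rate for a diagonal power allocation, as in (11)."""
    return sum(math.log2(1.0 + g * p) for g, p in zip(gains, powers)) / num_bts
```

Because a single scalar rescales the whole allocation, only the most heavily loaded BTS transmits at full power $P$; the others are left with unused power, which is the price paid for the low complexity.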

C. Scheduling Schemes 

From Lemma 1, there is a constraint on the maximum number of users a multi-cell BD
system can support simultaneously. Therefore, with a large number of users in each cluster, it is 
necessary to schedule transmission for a subset of users, according to some performance criterion. 
The sum rate optimal scheduling algorithm is to exhaustively search over all the possible user 
combinations and pick the user set which maximizes the chosen performance metric, which is 
extremely complicated. We propose to use a sum rate based sub-optimal user selection algorithm 
inspired by [48], which has low complexity and approaches optimal performance. 

Let U and S denote the sets of unselected and selected users, respectively, and let each user k
have an associated performance metric. The proposed user selection algorithm is described in Table II.
This is a greedy algorithm. In each step, the user that adds the maximum performance gain is
selected from the unselected user set, and the process stops when no more users can be added
or the performance metric begins to decrease. We consider two different kinds of scheduling,
maximum sum rate (MSR) and proportional fairness (PF), for different scenarios. 
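The greedy selection loop described above can be sketched as follows. Since Table II is not reproduced here, this is only a generic illustration of the stated rule: the `metric` callable (which would return, for example, the sum rate or a proportional-fairness utility of a candidate user set) and the function name are hypothetical.

```python
def greedy_user_selection(users, metric, max_users):
    """Greedy user selection.

    users: iterable of sortable user identifiers (the initial unselected set U).
    metric: callable mapping a list of selected users to a scalar performance value.
    max_users: maximum number of users that can be served, e.g. the Lemma 1 bound.
    Returns the selected list S and the metric value it achieves.
    """
    unselected = set(users)
    selected = []
    best_value = float("-inf")
    while unselected and len(selected) < max_users:
        # Try each remaining user and keep the one giving the largest metric.
        candidate, candidate_value = None, float("-inf")
        for user in sorted(unselected):
            value = metric(selected + [user])
            if value > candidate_value:
                candidate, candidate_value = user, value
        # Stop as soon as adding the best candidate would decrease the metric.
        if candidate is None or candidate_value < best_value:
            break
        selected.append(candidate)
        unselected.remove(candidate)
        best_value = candidate_value
    return selected, best_value
```

Each pass evaluates the metric once per remaining user, so the number of evaluations grows at most quadratically with the number of users, in contrast to the exponential cost of exhaustive search.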
IV. Inter-cluster Coordination

With the proposed coordination strategy, BTSs within a cluster serve their interior users with 
multi-cell BD, while the neighboring clusters coordinate with each other to serve edge users. It 
is possible for multiple BTSs to transmit data to an edge user, but for simplicity we consider that 
each user is served by one cluster. In this section, we will describe inter-cluster coordination in 
detail, and investigate two important system parameters: coordination distance and cluster size. 

A. Inter-cluster Coordination with Multi-cell BD 

The main idea of inter-cluster coordination is to do interference pre-cancelation at all the 
neighboring clusters for the active edge user, and select one cluster to transmit information 
data to this user. The precoding technique used in this paper for inter-cluster coordination is 
multi-cell BD, the same as for intra-cluster coordination. Each edge user selects a cluster based 
on the channel state, denoted as the home cluster, and feeds back this decision, while the other 
neighboring clusters act as helpers for the data transmission. The remaining clusters are interferer 
clusters. Different kinds of clusters and inter-cluster transmission are illustrated in Fig. 2.

For the following discussion and simulation, we focus on a home cluster and assume that 
when this cluster schedules an edge user, the neighboring clusters of this edge user will always 
help. This will happen if there are a small number of users in each cluster so that there are spare 
degrees of freedom at neighboring clusters. With a large number of users, joint scheduling across 
clusters is required. While we leave the full investigation of such a scheduling problem to future 
work, we propose a simple two-step approach: first, each cluster does scheduling within its own 
cluster, and the scheduled edge users inform the neighboring helper clusters; in the second step, 
each cluster deals with the requests from edge users in the neighboring clusters, and it selects 
to help some of these users while dropping some scheduled users of its own. After this scheduling
process, each cluster designs precoding matrices. 

To the home cluster, there is no difference between the edge user and interior users, and the 
BD precoding matrix is designed as in Section III. For helper clusters, the precoding matrix 
design will be different. Without loss of generality, we consider the precoding matrix design at
the helper cluster $c_1$ for the edge user $k_0$, which is served by its home cluster $c_0$. Denote

$$
\hat{H}_{k_1}^{(c_1)} = [\tilde{H}_{k_1}^{(c_1)*} \; \bar{H}_{k_0}^{(c_1)*}]^*,
$$

where $\tilde{H}_{k_1}^{(c_1)}$ is the aggregate interference matrix of user $k_1$ to all the other active users in the
cluster $c_1$ as in (5), and $\bar{H}_{k_0}^{(c_1)} = W_{k_0}^{(c_0)} H_{k_0}^{(c_1)}$ is the effective channel after the whitening filter from
the cluster $c_1$ to the edge user $k_0$.

To pre-cancel the interference for both the edge user $k_0$ and other active users in the cluster
$c_1$, the precoding matrix $T_{k_1}^{(c_1)}$ should satisfy the condition $\hat{H}_{k_1}^{(c_1)} T_{k_1}^{(c_1)} = 0$, i.e. it should lie in
the null space of $\hat{H}_{k_1}^{(c_1)}$, which can be designed with the SVD of $\hat{H}_{k_1}^{(c_1)}$ in the same way as in Section
III. Similar to Lemma 1, there is a constraint on the number of users that can be supported
simultaneously in the helper cluster, stated as follows: 

Lemma 2 (User constraint for the helper cluster): For a helper cluster with B BTSs and $k_e$
edge users to help, the maximum number of users that can be supported simultaneously by
multi-cell BD in this cluster is bounded by

$$\left\lceil \frac{B N_t}{N_r} \right\rceil - k_e.$$
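A brief sketch of where a bound of this form comes from may be helpful. It assumes, as in single-cell BD, that a user can be served only if its precoder has at least one non-zero column, and that the stacked channel matrix has full row rank. The symbols $K$ (number of active users in the helper cluster), $\mathbf{V}^{(1)}$ and $\mathbf{V}^{(0)}$ are introduced here for illustration only.

```latex
% Matrix to be nulled by the precoder of one active user in the helper cluster:
% (K-1) other active users plus k_e edge users, each with N_r receive antennas.
\hat{\mathbf{H}} \in \mathbb{C}^{(K-1+k_e)N_r \times BN_t},
\qquad
\hat{\mathbf{H}} = \mathbf{U}\,[\boldsymbol{\Sigma}\;\;\mathbf{0}]\,
  [\mathbf{V}^{(1)}\;\;\mathbf{V}^{(0)}]^{*}.

% The columns of V^{(0)} span the null space of \hat{H}, so the precoder is
% built from V^{(0)}. With full row rank, V^{(0)} is non-empty exactly when
BN_t > (K-1+k_e)N_r
\;\Longleftrightarrow\;
K < \frac{BN_t}{N_r} + 1 - k_e
\;\Longleftrightarrow\;
K \le \left\lceil \frac{BN_t}{N_r} \right\rceil - k_e .

% Worked example: B = 3, N_t = 4, N_r = 2, k_e = 1 gives K \le 6 - 1 = 5.
```

Each edge user helped therefore costs the helper cluster one of its own user slots in this accounting.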

Therefore, to serve an edge user with inter-cluster coordination, the total number of users the 
network can support will be reduced, which induces a tradeoff between mitigating interference 
for edge users and maximizing the total throughput. This makes the choice of the inter-cluster 
coordination area important. In fact, the user constraints in Lemma 1 and Lemma 2 are due to
the constraint on the total spatial degrees of freedom in each cluster, determined by the cluster
size and the number of transmit antennas at each BTS. To serve an edge user, all the neighboring
clusters need to provide a certain number of degrees of freedom, which leaves fewer degrees of
freedom to serve their own cluster interior users.



B. Inter-cluster Coordination Distance 

In this section, we present one method for grouping the users into cluster interior and cluster 
edge users, which will be employed in our simulations to illustrate our algorithms' performance. 
Our proposed metric is based on the channel model in this paper, which includes Rayleigh 
fading, shadowing and path loss, and omnidirectional antennas. With this model, users near the 
cluster edge will have low signal power and high interference on average, and require inter-cluster
coordination. Therefore, we group users based on their locations, and determine an
inter-cluster coordination area by the coordination distance, which is defined as follows and
illustrated in Fig. 2.

Definition 1: Coordination distance, Dc, is the boundary between interior and edge users. If
the distance of the user to the cluster edge is no larger than Dc, this user is classified as a cluster
edge user; otherwise, it is a cluster interior user.

^The active users in a cluster are the users currently being served.

In a real implementation, this grouping could be performed based on average signal strength 
measurements (as employed in the handoff algorithm for example). We defer development of 
measurement based approaches, however, to future work. 
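The location-based rule of Definition 1 can be sketched in a few lines. This is only an illustration: the function name and the assumption that the distance from each user to the cluster edge is already known are ours, not part of the original text.

```python
def classify_user(dist_to_cluster_edge: float, d_c: float) -> str:
    """Classify a user following Definition 1.

    dist_to_cluster_edge: distance from the user to the cluster edge
        (assumed to be computed elsewhere from the user location).
    d_c: coordination distance, in the same unit.
    """
    if dist_to_cluster_edge < 0 or d_c < 0:
        raise ValueError("distances must be non-negative")
    # "No larger than Dc" means the boundary itself counts as edge.
    return "edge" if dist_to_cluster_edge <= d_c else "interior"


def group_users(distances, d_c):
    """Split user indices into (interior, edge) lists."""
    interior, edge = [], []
    for idx, dist in enumerate(distances):
        (edge if classify_user(dist, d_c) == "edge" else interior).append(idx)
    return interior, edge
```

For instance, with R = 1 km and Dc = 0.35R, a user located 0.2 km from the cluster edge would be treated as an edge user, and one located 0.6 km from it as an interior user.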

Naturally there is a tradeoff when choosing Dc. If Dc is large, more users will be treated 
as edge users and enjoy a substantial interference reduction, but the total throughput will be
reduced as the total number of active users will be reduced. To balance fairness to edge users 
and the total sum rate, we will investigate the mean minimum rate and effective sum rate, as a 
function of Dc. 

Mean Minimum Rate: Suppose that the mobile users are randomly distributed within each
cluster. For a given Dc and each realization of user locations, denote $R_{\min}(D_c)$ as the minimum
rate among all the users in the cluster. The mean minimum rate, $\bar{R}_{\min}(D_c)$, is the mean value of
$R_{\min}(D_c)$, which is mainly determined by the edge users and will increase as Dc increases.

Effective sum rate: As the edge user is served by multiple neighboring clusters, effectively
its rate is shared by those clusters. If there are $N_{c,i}$ clusters serving user i, which is decided by
its location and Dc, then the effective rate of this user for each coordinating cluster is $R_i/N_{c,i}$,
where Ri is given as follows according to ^ 



(12) 



The effective sum rate for each cluster is defined as

$$R_{\mathrm{sum}}(D_c) = \sum_{k} \frac{R_k}{N_{c,k}(D_c)}, \qquad (13)$$

which will decrease with the increase of Dc as more users become edge users.

^Other similar performance metrics regarding the fairness to the edge users can also be applied, e.g., the achievable rate at a
certain outage probability. The results, however, will not change.

For a home cluster, if all the users are interior users, then the effective sum rate is the
conventional sum rate for this home cluster; if there is an edge user in this home cluster served
by Nc clusters, only 1/Nc of this user's rate is counted into the effective sum rate of each serving
cluster, including the home cluster. Therefore, the effective sum rate is the same for each cluster
in a homogeneous network.
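The accounting behind the effective sum rate can be illustrated with a minimal sketch. The function name and the example numbers are hypothetical; the per-user rates are assumed to be given.

```python
def effective_sum_rate(rates, num_serving_clusters):
    """Effective sum rate of one cluster: sum of R_k / N_{c,k}.

    rates: per-user rates R_k (bps/Hz) for the users counted at this cluster.
    num_serving_clusters: N_{c,k} for each user (1 for an interior user,
        more than 1 for an edge user served by several clusters).
    """
    if len(rates) != len(num_serving_clusters):
        raise ValueError("one cluster count is needed per user")
    total = 0.0
    for rate, n_clusters in zip(rates, num_serving_clusters):
        if n_clusters < 1:
            raise ValueError("each user is served by at least one cluster")
        total += rate / n_clusters  # an edge user's rate is shared
    return total


# Two interior users and one edge user served by three clusters
# (illustrative numbers only, not simulation results).
print(effective_sum_rate([2.0, 1.5, 3.0], [1, 1, 3]))  # prints 4.5
```

When every user is an interior user, all counts equal 1 and the function reduces to the conventional sum rate.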

According to these definitions, the mean minimum rate $\bar{R}_{\min}(D_c)$ and the effective sum rate $R_{\mathrm{sum}}(D_c)$ characterize the opposing objectives
of fairness to edge users and total sum throughput. We propose to use a utility function, U(Dc),
to evaluate the effect of Dc on both.

Definition 2 (Utility Function U(Dc)): The utility function U(Dc) is defined by

$$U(D_c) = \alpha \, \frac{\bar{R}_{\min}(D_c)}{\max_{D_c} \bar{R}_{\min}(D_c)} + (1-\alpha) \, \frac{R_{\mathrm{sum}}(D_c)}{\max_{D_c} R_{\mathrm{sum}}(D_c)}, \quad 0 < \alpha < 1, \qquad (14)$$

where $\alpha$ is a variable reflecting the design objective. If it is more valuable to care about edge
users, we can pick $\alpha \to 1$; if sum rate is more important, we can pick $\alpha \to 0$. As an example,
we pick $\alpha = 1/2$, which means we treat relative changes of Rmin and Rsum as of equal value
to the system.
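As an illustration of how the utility function might be evaluated on a grid of candidate coordination distances, consider the following sketch. The function names are ours, and the example values are made-up placeholders rather than simulation results.

```python
def utility(r_min, r_sum, alpha=0.5):
    """Evaluate U(Dc) on a grid of coordination distances.

    r_min[i], r_sum[i]: mean minimum rate and effective sum rate at the
        i-th candidate Dc (assumed to come from simulation).
    alpha: weight on fairness, 0 < alpha < 1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if len(r_min) != len(r_sum) or not r_min:
        raise ValueError("r_min and r_sum must be non-empty and equal length")
    max_min, max_sum = max(r_min), max(r_sum)
    # Each term is normalized by its maximum over the grid.
    return [alpha * a / max_min + (1.0 - alpha) * b / max_sum
            for a, b in zip(r_min, r_sum)]


def best_dc(d_grid, r_min, r_sum, alpha=0.5):
    """Return (Dc, U(Dc)) for the grid point with the largest utility."""
    values = utility(r_min, r_sum, alpha)
    best = max(range(len(values)), key=values.__getitem__)
    return d_grid[best], values[best]


# Placeholder data: fairness improves and sum rate drops as Dc grows.
d_grid = [0.0, 0.25, 0.5]
print(best_dc(d_grid, r_min=[0.2, 0.4, 0.5], r_sum=[10.0, 9.0, 6.0])[0])  # prints 0.25
```

Because both terms are normalized, the maximizer depends only on the relative shapes of the two curves, not on their absolute scales.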

Simulation results of U(Dc) for Dc ∈ [0, R) are shown in Fig. 3, with B = 3, R = 1 km,
K = 30, and an interference-free SNR of 18 dB at the cell edge. In total, 1000 realizations of user
locations are run, and for each realization 1000 iterations are simulated with independent channel
states. PF scheduling is used to provide fairness, and the scaled water-filling power allocation
is used for computational efficiency. The results show that the maximum value is
achieved around Dc = 0.35R, which is therefore a suitable choice.

Inter-cluster coordination may be designed for criteria other than Rmin and Rsum, but the
idea of making a good tradeoff between the fairness for the edge users and the sum rate persists.

C. Cluster Size 

With a fixed Dc, if the cluster size is small, the relative coordination area is large and there
will be too many cluster edge users, which consume many of the degrees of freedom and lower
the effective sum rate. Conversely, a large cluster has a relatively small coordination
area and thus a small sum rate loss. However, the requirement of full CSI and synchronization
prohibits a very large cluster size, and due to path loss the users benefit little from
BTSs far away. Selecting a suitable cluster size is therefore important for practical systems,
which is also the motivation for proposing clustered coordination.

^When Dc = R, the area around the BTSs is classified as inter-cluster coordination area, which is not going to be the case as
the nearby BTS can provide a high SINR for the users in this area. Therefore, we only consider Dc ∈ [0, R) in this simulation.

1) Sum Rates for Different Cluster Sizes: Fig. 4 shows the effective sum rates per cell for
different cluster sizes, B = 1, 3, 7, 19, with Dc = 0.35R, R = 1 km, and an interference-free SNR
of 18 dB at the cell edge. The gain diminishes as the cluster
size increases: the 3-cell cluster has a much higher sum rate than the 1-cell cluster, and a 7-cell cluster
has a rate gain of about 2.5 bps/Hz over a 3-cell cluster, while from B = 7 to B = 19 the sum
rate increases by only about 1 bps/Hz. The lower sum rate for B = 1 is due to its relatively large edge
area. Therefore, a 7-cell cluster already achieves a significant part of the performance gain
of clustered coordination.

2) CSI Feedback Reduction: The CSI requirement for clustered coordination is on a cluster
scale, which is greatly reduced compared to global coordination. With C clusters and B cells
in each cluster, there are BC BTSs in total in the network. For global coordination, the effective
channel matrix for each user is $N_r \times BCN_t$, while for clustered coordination it is $N_r \times BN_t$.
Therefore, we get the following lemma:

Lemma 3 (CSI Reduction): For a cellular network with $N_{\mathrm{cell}}$ cells, the amount of CSI feedback
for clustered coordination with cluster size B is $B/N_{\mathrm{cell}}$ of that for global coordination.

The amount of CSI feedback for a 7-cell cluster system is only 7/19 of that for a 19-cell cluster,
while the performance of the 7-cell cluster system does not degrade much, as shown in Fig. 4,
so a cluster size of 7 is a reasonable choice for clustered coordination with the given transmit
power.
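The arithmetic behind Lemma 3 can be checked with a short sketch. The function names and the 133-cell network size are illustrative assumptions; the count is the number of complex entries in one user's effective channel matrix.

```python
from fractions import Fraction


def csi_entries_per_user(n_r, n_t, num_bts):
    """Entries in one user's effective channel matrix, N_r x (num_bts * N_t)."""
    return n_r * n_t * num_bts


def csi_ratio(cluster_size, n_cells):
    """CSI feedback of clustered coordination relative to global coordination."""
    clustered = csi_entries_per_user(1, 1, cluster_size)
    global_ = csi_entries_per_user(1, 1, n_cells)
    return Fraction(clustered, global_)  # N_r and N_t cancel: B / N_cell


# Hypothetical network of 133 cells (divisible by both 7 and 19).
n_cells = 133
print(csi_ratio(7, n_cells))                          # prints 1/19
print(csi_ratio(7, n_cells) / csi_ratio(19, n_cells))  # prints 7/19
```

The ratio between two cluster sizes does not depend on the network size, which is why the 7-cell and 19-cell systems can be compared directly.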

V. Numerical Results 

In this section, the performance of the proposed coordination strategy is shown via Monte
Carlo simulation. We choose the number of antennas to be Nt = 4 and Nr = 2. The standard
deviation of shadowing is 8 dB, the path loss exponent is 3.7, and the cell radius is 1 km. Unless
otherwise stated, the interference-free SNR at the cell edge is 18 dB, accounting for path loss and
ignoring shadowing and Rayleigh fading. We assume all the BTSs in other clusters transmit at
full power. Mobile users are uniformly distributed within each cluster, and they are associated 
with clusters based on locations. 
A. Sum Rates for Different Systems

First, we consider sum rates for different systems with maximum sum rate (MSR) scheduling. 
Besides the proposed multi-cell BD systems, we also compare with the following systems. 

• Multi-cell DPC with Total Power Constraint (TPC): This is an upper bound for the downlink 
channel of multi-cell systems. We assume a total power constraint. DPC is applied across 
BTSs within the same cluster, and algorithm 2 in [49] is used for power allocation. 

• Multi-cell BD with TPC: This is similar to the single-cell BD, and the water-filling algorithm 
can be applied to the aggregated channel for power allocation. This serves as an upper bound 
for multi-cell BD with PBPC, and can indicate the capacity loss due to the PBPC. 

• TDMA with Intercell Scheduling [15]: Neighboring BTSs cooperatively schedule their 
transmission, and only one BTS is active to serve one user at each time slot. 

• Intercell Scheduling with BD: Compared to TDMA with intercell scheduling only, this 
technique allows one BTS to serve multiple users at each time slot with BD. 
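For reference, the water-filling step mentioned for multi-cell BD with TPC can be sketched as follows. This is the classical total-power water-filling over parallel streams, not the per-BTS power allocation algorithms proposed in this paper; the function name and the representation of the aggregated channel as a list of effective stream gains (normalized by the noise power) are assumptions made for illustration.

```python
def water_filling(gains, total_power):
    """Maximize sum(log2(1 + g_i * p_i)) subject to sum(p_i) <= total_power.

    gains: positive effective gains g_i of the parallel streams.
    Returns the powers p_i in the same order as `gains`.
    """
    if total_power <= 0 or any(g <= 0 for g in gains):
        raise ValueError("power and gains must be positive")
    # Strongest streams first; 1/g_i is the "floor" height of each stream.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    floors = [1.0 / gains[i] for i in order]
    powers = [0.0] * len(gains)
    # Try to fill the n strongest streams, dropping the weakest until
    # the water level lies above every floor that is kept.
    for n in range(len(floors), 0, -1):
        level = (total_power + sum(floors[:n])) / n
        if level > floors[n - 1]:
            for j in range(n):
                powers[order[j]] = level - floors[j]
            break
    return powers


print(water_filling([2.0, 1.0], 1.5))       # prints [1.0, 0.5]
print(water_filling([1.0, 0.5, 0.1], 1.0))  # prints [1.0, 0.0, 0.0]
```

Under a per-BTS power constraint the powers can no longer be chosen freely across BTSs in this way, which is why the TPC result serves only as an upper bound.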

Fig. 5 compares sum rates for different systems. There are several key observations.

1) The sum rates of multi-cell BD systems are much higher than that of the TDMA system
with intercell scheduling, and are close to that of DPC.

2) All multi-cell BD schemes have about the same performance. 

3) There is only a marginal rate loss of PBPC to TPC. 

B. Distribution of User Rates 

Fig. 6 shows the cumulative distribution function (CDF) of mean rates for users. There are
30 users uniformly distributed in the cluster, B = 3 and Dc = 0.35R, and PF scheduling is
applied. The simulation setting is similar to that for Fig. 3. We run 100 realizations of user
locations, and for each realization 1000 iterations are simulated with independent channel states
and the mean rates are stored. In total, there are 3000 samples of user rates, from which we
plot the CDF. For example, the rate with 10% outage for intercell scheduling is 0.4 bps/Hz, for
intercell scheduling with BD is 0.1 bps/Hz, and for clustered multi-cell BD with and without
inter-cluster coordination are 0.6 and 0.8 bps/Hz, respectively. For multi-cell BD with inter-cluster
coordination, nearly 60% of users have a mean rate larger than 1 bps/Hz and 10% of users have a
mean rate larger than 2 bps/Hz, while for intercell scheduling fewer than 5% of users
have a mean rate larger than 1 bps/Hz.
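The quantities quoted above (empirical CDF, rate at a given outage level, fraction of users above a threshold) can be obtained from the stored mean rates roughly as follows. The function names and the percentile convention (smallest sample whose empirical CDF reaches the outage level) are our assumptions; other conventions give slightly different values.

```python
import math


def empirical_cdf(samples):
    """Return sorted (rate, CDF value) pairs for the given samples."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def outage_rate(samples, outage=0.10):
    """Smallest sample x such that the empirical CDF at x is >= outage."""
    if not samples or not 0.0 < outage <= 1.0:
        raise ValueError("need samples and an outage level in (0, 1]")
    xs = sorted(samples)
    k = max(1, math.ceil(outage * len(xs)))
    return xs[k - 1]


def fraction_above(samples, threshold):
    """Fraction of users whose mean rate exceeds the threshold."""
    return sum(1 for x in samples if x > threshold) / len(samples)
```

With 100 location realizations of 30 users each, `samples` would hold the 3000 stored mean rates, and `outage_rate(samples, 0.10)` would return the rate with 10% outage under the convention stated above.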
C. Imperfect Channel Knowledge

Pilot symbols are required for channel estimation, and such training overhead becomes greater
for a larger cluster size. Even so, estimation errors are inevitable, and the simulation
results accounting for imperfect channel knowledge are shown in Fig. 7. The channel estimation
model in [31] is used. At the BTSs, the available knowledge of the small-scale fading channel
matrix of the kth user is given by $\hat{\mathbf{H}}_k^{(c,b)} = \mathbf{H}_k^{(c,b)} + \mathbf{E}_k^{(c,b)}$, where $\mathbf{H}_k^{(c,b)}$ is the true channel matrix
and $\mathbf{E}_k^{(c,b)}$ is the channel error. Entries of $\mathbf{E}_k^{(c,b)}$ follow an i.i.d. complex Gaussian distribution with
zero mean and variance $\sigma_{\mathrm{MSE}}^2/2$ per real dimension. The channel knowledge error is denoted
as MSE = $10 \log_{10} \sigma_{\mathrm{MSE}}^2$ dB. To demonstrate the impact of imperfect CSI, we assume equal
MSE for each user. The unequal MSE case is left to future work. We can see that the sum rates
for BD systems decrease as MSE increases, while the TDMA system with intercell scheduling is
not sensitive to channel error; nevertheless, the sum rates of multi-cell BD systems are always higher over
the simulated range. The degradation is due to the imperfect inter-user interference cancelation with channel
error for MU-MIMO systems: such interference comes through the same propagation channel as
the information signal, so it greatly degrades the performance. Therefore, robust precoding
schemes are required in practical systems.
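A minimal sketch of the additive error model described above is given below, assuming the error variance is specified in dB as in the text. The function name and the nested-list matrix representation are illustrative choices, and the estimation procedure of [31] itself is not reproduced.

```python
import math
import random


def noisy_channel_estimate(h_true, mse_db, rng=None):
    """Return H + E, with E i.i.d. zero-mean complex Gaussian.

    h_true: true channel matrix as a list of rows of complex numbers.
    mse_db: error level in dB, i.e. 10 * log10(sigma_mse ** 2).
    rng: optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random()
    sigma2 = 10.0 ** (mse_db / 10.0)   # total variance per complex entry
    std = math.sqrt(sigma2 / 2.0)      # standard deviation per real dimension
    return [[h + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
             for h in row]
            for row in h_true]


# Example: a 2 x 4 all-ones channel observed with MSE = -10 dB.
h = [[1 + 0j] * 4 for _ in range(2)]
h_hat = noisy_channel_estimate(h, mse_db=-10.0, rng=random.Random(0))
```

A precoder designed from `h_hat` no longer nulls the interference exactly on the true channel, which is the mechanism behind the sum rate loss of the BD systems.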

VI. Conclusion

In this paper, a clustered BTS coordination strategy is proposed to increase the available 
spatial degrees of freedom for MIMO networks, and thus to reduce interference and increase 
the sum rate. A cluster structure is formed, and the users are grouped into cluster interior users 
and cluster edge users, served with different coordination strategies. Cluster interior users are 
served with intra-cluster coordination, i.e. multi-cell BD, while cluster edge users are served by 
multiple neighboring clusters to reduce inter-cluster interference. The precoder for multi-cell BD 
and system parameters for inter-cluster coordination are designed. It is shown that a small cluster 
size (such as 7) is enough to provide the benefits of the clustered coordination while greatly 
reducing the amount of channel feedback. Numerical results show that the proposed coordination 
strategy can provide robust sum rate and edge user rate gains. 

There are many practical issues associated with clustered BTS coordination, requiring much 
future work. Compared with global coordination, the cluster structure reduces the amount of 
CSI required at the BTS, but with multiple antennas at both the BTS and mobiles, the amount 
of CSI is still daunting. Current schemes are sensitive to synchronization and CSI error, which
is expected to increase in a cluster system, so robust precoding schemes are needed. In this 
paper, we have assumed that all users have perfect knowledge about other-cluster interference. 
The investigation of the imperfect interference estimation is of practical importance and is a 
worthy topic of future work. Generally, the analysis of cellular MIMO networks is an open 
problem, given the randomness of user locations, path loss, and matrix channels with fading and 
shadowing. 
TABLE I

System Parameters

Symbol    Description
P         the maximum transmit power at each BTS
B         number of BTSs in each cluster
C         number of clusters we consider
K         number of users per cluster
lk        length of data symbol for user k
Nt        number of transmit antennas at each BTS
Nr        number of receive antennas at each mobile
R         radius of each cell
Dc        coordination distance
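TABLE II
User Selection Algorithm

1) Initially, set $\mathcal{S} = \emptyset$ and $\mathcal{U} = \{1, 2, \ldots, K\}$. Set $C_{\mathrm{old}} = 0$.

2) While $|\mathcal{S}| < K$ and $|\mathcal{S}| < \hat{K}$:

   a) For every $k \in \mathcal{U}$:

      i) Let $\mathcal{S}_k = \mathcal{S} + \{k\}$.

      ii) Calculate $C_{\mathrm{new}} = \sum_{s \in \mathcal{S}_k} f_s$.

      iii) If $C_{\mathrm{new}} > C_{\mathrm{old}}$, set $C_{\mathrm{old}} = C_{\mathrm{new}}$ and $\hat{k} = k$.

   b) Let $\mathcal{S} = \mathcal{S} + \{\hat{k}\}$, $\mathcal{U} = \mathcal{U} - \{\hat{k}\}$.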
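The greedy procedure of Table II can be sketched in code as follows. The callable `metric`, which returns the total scheduling metric of a candidate user set, is a hypothetical stand-in for the rate computation described in the paper. The early exit when no remaining user improves the metric is an assumption added here so that the loop always terminates with a well-defined choice.

```python
def greedy_user_selection(num_users, max_users, metric):
    """Greedily build a user set that increases the total metric.

    num_users: K, the number of candidate users (indexed 0..K-1).
    max_users: upper limit on simultaneously served users.
    metric: callable taking a list of user indices and returning the
        total metric of that set (hypothetical interface).
    Returns (selected users in order of selection, best metric value).
    """
    selected = []
    remaining = set(range(num_users))
    best_value = 0.0
    while remaining and len(selected) < max_users:
        best_user = None
        for k in sorted(remaining):
            value = metric(selected + [k])  # metric of the tentative set
            if value > best_value:
                best_value, best_user = value, k
        if best_user is None:
            break  # assumption: stop when no user improves the metric
        selected.append(best_user)
        remaining.remove(best_user)
    return selected, best_value
```

Each outer iteration evaluates the metric once per remaining user, so at most `num_users * max_users` evaluations are needed instead of a search over all user subsets.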



Fig. 1. An example of the clustered network, with B = 7. Node "C" in each cluster is the virtual controller, which means full
coordination within each cluster. The dashed line between controllers in neighboring clusters denotes the limited coordination
between these clusters.

Fig. 2. An example of inter-cluster coordination, B = 3. C1 is the home cluster, C2 is the helper cluster, and C3 is the
interferer cluster. Solid lines denote transmissions of information signals and dotted lines are interference, and the cross on the
dotted line means that the interference is pre-canceled.

Fig. 3. U(Dc) for different Dc, R = 1 km.

Fig. 4. Effective sum rate per cell with different cluster size, B = 1, 3, 7, 19, R = 1 km, and Dc = 0.35R. The standard
deviation of shadowing is 8 dB, the path loss exponent is 3.7.




[Fig. 5 panels: (a) Different K; (b) Small K. Horizontal axis: K (number of users).]

Fig. 5. Sum rate per cell for different systems, with cluster size B = 3. "OPT" denotes the optimal power allocation scheme, 
"US" denotes the user scaling scheme, and "SWF" denotes the scaled water-filling scheme. "DPC TPC" is the multi-cell dirty 
paper coding with total power constraint, and "TDMA" is the opportunistic intercell scheduling. 

Fig. 6. CDF of the rates for users in the cluster, B = 3, Dc = 0.35R.

Fig. 7. Sum rates for different systems with imperfect channel knowledge.



